What are "pivotal acts"?
A “pivotal act” typically refers to the use of powerful AI to take some unilateral action with the aim of drastically reducing existential risk from AI, usually an action that would substantially delay AI progress or limit the ability of other actors to build dangerous AI. An example of such an act is using AI to systematically destroy the computing hardware needed for further AI development.
The term was originally coined by Eliezer Yudkowsky to mean something more specific: a “hypothetical action that will make a large positive difference a billion years later”, in contrast with an existential catastrophe, an event that makes a large negative difference a billion years later. He has also stressed that he intended the term to refer to acts carried out using powerful AI. Under the looser typical usage, a human attack on a single AI lab might count as a pivotal act; under Yudkowsky’s definition it would not, since it involves no powerful AI and would be unlikely to change the long-run outcome.
Pivotal acts were proposed as a way for researchers to buy enough time to fully solve AI alignment. The hope is that it might be easier to align an AI well enough to take this limited action than to solve the complete alignment problem.
The problem of designing an AI to carry out a minimal pivotal act can be viewed as a limited version of the alignment problem: can we give precise enough instructions to a sufficiently powerful AI that it carries out an action which actually prevents other people from deploying an unaligned AI, without unwanted side effects?
When MIRI researchers talk about this problem, they often use the “strawberry task” as an example of the level of power needed for a pivotal act. The strawberry task involves producing two strawberries that are identical at the cellular level and then ceasing all action. If we had an alignment technique which could reliably get an AI to achieve this task with no unwanted side effects, then that AI could plausibly be used for a pivotal act.
The key here is that you want to build a system that is:
- aligned so well that it does exactly what you want it to do;
- aligned so well that it doesn't do anything you don't want it to do;
- powerful enough to do something sufficiently complex to be impactful (but obviously not so powerful that alignment is intractable).
Pivotal acts are, by their nature, highly impactful and often outside the Overton window. Any system powerful enough to enact a pivotal act would be dangerous if unaligned.
For a critical view, Andrew Critch argues against the strategy of designing an AI to take a unilateral “pivotal act”, since doing so would breed distrust, increase conflict, and fuel racing between AI labs. He advocates instead for a collaborative pivotal process.